
Web Collection 2012

Web crawl data from the year 2012. Some of this data is currently not publicly accessible.

192,243
RESULTS

PART OF
Web Collections
Web Crawls

TOPIC
crawldata 152,946
wiki 3,503
dumps 3,495
incremental 3,494
media 3,494
tape 3,494
Wikipedia 1,372
Wikibooks 398
Wikiquote 330
2011 273
Wikimedia 145
Wikinews 144
English 98
Czech 35
German 35
Greek 35
Italian 35
Russian 35
Spanish 35
Swedish 35
Arabic 34
Finnish 34
French 34
Bosnian 30
Catalan 30
Chinese 30
Korean 30
Polish 30
Serbian 30
Tamil 30
Turkish 30
Hebrew 29
Persian 29
Dutch 25
Latin 25
Marathi 25
Slovak 25
Telugu 25
Thai 25
Welsh 25
Danish 21
Basque 20
Bengali 20
Breton 20
Hindi 20
Kannada 20
Kurdish 20
Kyrgyz 20
Uzbek 20
Urdu 17
Amharic 15
Faroese 15
Kazakh 15
Khmer 15
Malay 15
Nepali 15
Occitan 15
Punjabi 15
Sinhala 15
Tagalog 15
Tajik 15
Tatar 15
Wolof 15
Yiddish 15
groklaw 12
Aymara 10
Burmese 10
Chuvash 10
Cornish 10
Fijian 10
Guarani 10
Hausa 10
Ido 10
Inupiaq 10
Irish 10
Lao 10
Latvian 10
Lingala 10
Lojban 10
Maltese 10
Manx 10
Maori 10
Nauru 10
Oriya 10
Oromo 10
Quechua 10
Sakha 10
Samoan 10
Sango 10
Somali 10
Swahili 10
Swati 10
Tsonga 10
Tswana 10
Turkmen 10
Walloon 10
Zhuang 10
Zulu 10
Divehi 9
Pashto 9
Sindhi 9
Uyghur 9
Wide Crawl started January 2012
web
eye 3.7M
favorite 0
comment 0
Internet Archive crawldata from Webwide Crawl, captured by crawl423.us.archive.org:wide from Tue Jan 17 08:02:53 PST 2012 to Tue Jan 17 01:16:20 PST 2012.
Topic: crawldata
vkontakte.ru
web
eye 2M
favorite 0
comment 0
Source: vkontakte.ru
Wide Crawl started January 2012
web
eye 1.6M
favorite 0
comment 0
Internet Archive crawldata from Webwide Crawl, captured by crawl413.us.archive.org:wide from Sat Jan 21 04:01:50 PST 2012 to Fri Jan 20 21:01:34 PST 2012.
Topic: crawldata
Live Web Proxy Crawls
web
eye 716,975
favorite 0
comment 0
Internet Archive Liveweb Capture from WayBackMachine, captured by wwwb-gen1.us.archive.org:wbm from Tue Jan 3 12:37:06 PST 2012 to Tue Jan 3 06:31:55 PST 2012.
Topic: crawldata
Japan Earthquake
web
eye 696,725
favorite 0
comment 0
recurrence=NONE, maxDuration=259200, maxDocumentCount=null, isTestCrawl=false, seedCount=507, accountId=156, organizationName="Virginia Tech: Crisis, Tragedy, and Recovery Network", collectionId=2438, collectionName="Japan Earthquake"
Wikipedia Outlinks February 2012
web
eye 590,580
favorite 0
comment 0
Internet Archive crawldata from Wikipedia outbound links, captured by crawl435.us.archive.org:wpo from Thu Mar 1 20:56:37 PST 2012 to Thu Mar 1 14:19:48 PST 2012.
Topic: crawldata
The Archive Team Just In Time Grabs
web
eye 569,262
favorite 0
comment 0
Live Web Proxy Crawls
web
eye 551,849
favorite 0
comment 0
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-gen1.us.archive.org from 2012-07-20T21:59:15 UTC to 2012-07-21T06:44:52 UTC.
Topic: crawldata
Wide Crawl started September 2012
web
eye 525,686
favorite 0
comment 0
Internet Archive crawldata from Webwide Crawl, captured by crawl337.us.archive.org:wide from Wed Oct 17 08:14:47 PDT 2012 to Wed Oct 17 02:41:59 PDT 2012.
Topic: crawldata
Live Web Proxy Crawls
web
eye 523,727
favorite 0
comment 0
Internet Archive Liveweb Capture from WayBackMachine, captured by wwwb-gen1.us.archive.org:wbm from Sat May 12 21:38:20 PDT 2012 to Sat May 12 18:43:53 PDT 2012.
Topic: crawldata
recurrence=NONE, maxDuration=259200, maxDocumentCount=null, isTestCrawl=false, seedCount=12, accountId=593, organizationName="Government Printing Office", collectionId=3142, collectionName="CFPB"
Alexa Crawls
by thumper2.php
web
eye 463,277
favorite 0
comment 0
Alexa crawl
Topic: crawldata
recurrence=WEEKLY, maxDuration=259200, maxDocumentCount=null, isTestCrawl=false, seedCount=3, accountId=575, organizationName="Chicago-Kent College of Law", collectionId=2817, collectionName="Chicago-Kent College of Law"
Youtube Videos
web
eye 446,557
favorite 0
comment 0
Internet Archive crawldata from YouTube Video archiving project 2011, captured by crawl440.us.archive.org:youtube from Sat Jul 21 05:39:04 PDT 2012 to Fri Jul 20 23:46:43 PDT 2012.
Topic: crawldata
Wide Crawl started September 2012
web
eye 432,505
favorite 0
comment 0
Internet Archive crawldata from Webwide Crawl, captured by crawl339.us.archive.org:wide from Fri Oct 19 01:17:49 PDT 2012 to Thu Oct 18 21:38:24 PDT 2012.
Topic: crawldata
Wide Crawl started January 2012
web
eye 401,204
favorite 0
comment 0
Internet Archive crawldata from Webwide Crawl, captured by crawl413.us.archive.org:wide from Thu Jan 5 18:18:33 PST 2012 to Thu Jan 5 11:30:47 PST 2012.
Topic: crawldata
Wide Crawl started January 2012
web
eye 389,217
favorite 0
comment 0
Internet Archive crawldata from Webwide Crawl, captured by crawl413.us.archive.org:wide from Thu Jan 5 20:28:12 PST 2012 to Thu Jan 5 13:25:29 PST 2012.
Topic: crawldata
Wide Crawl started January 2012
web
eye 353,407
favorite 0
comment 0
Internet Archive crawldata from Webwide Crawl, captured by crawl413.us.archive.org:wide from Thu Jan 5 16:52:35 PST 2012 to Thu Jan 5 10:17:00 PST 2012.
Topic: crawldata
Live Web Proxy Crawls
web
eye 352,922
favorite 0
comment 0
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-live1.us.archive.org from 2012-11-22T06:44:42 UTC to 2012-11-22T18:14:08 UTC.
Topic: crawldata
Wide Crawl started January 2012
web
eye 348,211
favorite 0
comment 0
Internet Archive crawldata from Webwide Crawl, captured by crawl420.us.archive.org:wide from Wed Jan 11 15:00:58 PST 2012 to Wed Jan 11 07:47:03 PST 2012.
Topic: crawldata
Live Web Proxy Crawls
web
eye 347,057
favorite 0
comment 0
Internet Archive Liveweb Capture from WayBackMachine, captured by wwwb-gen1.us.archive.org:wbm from Tue Jan 3 14:35:17 PST 2012 to Tue Jan 3 08:02:37 PST 2012.
Topic: crawldata
Live Web Proxy Crawls
web
eye 345,162
favorite 0
comment 0
Internet Archive Liveweb Capture from WayBackMachine, captured by wwwb-gen1.us.archive.org:wbm from Tue Jan 3 11:14:32 PST 2012 to Tue Jan 3 04:36:49 PST 2012.
Topic: crawldata
Wikipedia Outlinks February 2012
web
eye 342,361
favorite 0
comment 0
Internet Archive crawldata from Wikipedia outbound links, captured by crawl435.us.archive.org:wpo from Thu Oct 18 19:09:44 PDT 2012 to Thu Oct 18 13:30:19 PDT 2012.
Topic: crawldata
NLS_2012
web
eye 340,065
favorite 0
comment 0
Internet Archive crawldata uploaded by selenium-101.us.archive.org:NLS-CRAWL-004 from Sat Jun 16 16:29:25 PDT 2012 to Wed Dec 12 11:44:11 PST 2012.
Topic: crawldata
Election Crawl 2012
web
eye 322,522
favorite 0
comment 0
Internet Archive crawldata uploaded by crawling119.us.archive.org:COL-ELECTION2012 from Fri Jun 29 21:24:09 PDT 2012 to Thu Jan 10 20:37:50 PST 2013.
Topic: crawldata
Live Web Proxy Crawls
web
eye 319,269
favorite 0
comment 0
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-gen1.us.archive.org from 2012-06-26T02:05:01 UTC to 2012-06-26T10:40:25 UTC.
Topic: crawldata
Wide Crawl started January 2012
web
eye 316,072
favorite 0
comment 0
Internet Archive crawldata from Webwide Crawl, captured by crawl413.us.archive.org:wide from Thu Jan 5 15:53:45 PST 2012 to Thu Jan 5 08:51:54 PST 2012.
Topic: crawldata
Wide Crawl started January 2012
web
eye 313,829
favorite 0
comment 0
Internet Archive crawldata from Webwide Crawl, captured by crawl427.us.archive.org:wide from Tue Apr 17 00:58:46 PDT 2012 to Mon Apr 16 20:37:31 PDT 2012.
Topic: crawldata
Live Web Proxy Crawls
web
eye 309,516
favorite 0
comment 0
Internet Archive Liveweb Capture from WayBackMachine, captured by wwwb-gen1.us.archive.org:wbm from Wed Mar 7 23:01:46 PST 2012 to Wed Mar 7 17:53:26 PST 2012.
Topic: crawldata
Wide Crawl started April 2012
web
eye 300,778
favorite 0
comment 0
Internet Archive crawldata from Webwide Crawl, captured by crawl422.us.archive.org:wide from Fri Jun 8 07:28:07 PDT 2012 to Fri Jun 8 02:31:40 PDT 2012.
Topic: crawldata
Archive Team
web
eye 297,947
favorite 0
comment 0
This item contains regular captures of the homepages of Dutch news websites, in screenshot and WARC format. Websites: nos.nl, teletekst.nos.nl, rtlnieuws.nl, nu.nl, telegraaf.nl, metronieuws.nl, spitsnieuws.nl, volkskrant.nl, nrc.nl, trouw.nl, parool.nl, fd.nl, refdag.nl
Live Web Proxy Crawls
web
eye 297,190
favorite 0
comment 0
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-live0.us.archive.org from 2012-12-14T21:29:12 UTC to 2013-01-23T21:42:32 UTC.
Topic: crawldata
Wide Crawl started January 2012
web
eye 282,780
favorite 0
comment 0
Internet Archive crawldata from Webwide Crawl, captured by crawl336.us.archive.org:wide from Wed Jan 18 17:01:06 PST 2012 to Wed Jan 18 09:17:47 PST 2012.
Topic: crawldata
google.co.jp
web
eye 276,997
favorite 0
comment 0
Source: google.co.jp
Wikipedia Outlinks February 2012
web
eye 273,503
favorite 0
comment 0
Internet Archive crawldata from Wikipedia outbound links, captured by crawl435.us.archive.org:wpo from Sat Feb 11 14:58:31 PST 2012 to Sat Feb 11 08:33:22 PST 2012.
Topic: crawldata
Live Web Proxy Crawls
web
eye 272,384
favorite 0
comment 0
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-live1.us.archive.org from 2012-12-31T05:25:29 UTC to 2013-01-01T02:09:59 UTC.
Topic: crawldata
Live Web Proxy Crawls
web
eye 271,337
favorite 0
comment 0
Internet Archive Liveweb Capture from WayBackMachine, captured by wwwb-gen1.us.archive.org:wbm from Sun Feb 12 06:50:14 PST 2012 to Sun Feb 12 02:22:07 PST 2012.
Topic: crawldata
Live Web Proxy Crawls
web
eye 270,078
favorite 0
comment 0
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-live0.us.archive.org from 2012-12-18T16:45:10 UTC to 2012-12-19T09:20:10 UTC.
Topic: crawldata
Wide Crawl started January 2012
web
eye 263,354
favorite 0
comment 0
Internet Archive crawldata from Webwide Crawl, captured by crawl413.us.archive.org:wide from Thu Jan 5 19:32:22 PST 2012 to Thu Jan 5 12:20:54 PST 2012.
Topic: crawldata
Live Web Proxy Crawls
web
eye 261,987
favorite 0
comment 0
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-live0.us.archive.org from 2012-12-30T15:07:01 UTC to 2012-12-31T14:15:26 UTC.
Topic: crawldata
Live Web Proxy Crawls
web
eye 261,588
favorite 0
comment 0
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-gen1.us.archive.org from 2012-07-30T04:16:42 UTC to 2012-07-30T11:28:26 UTC.
Topic: crawldata
Live Web Proxy Crawls
web
eye 261,444
favorite 0
comment 0
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-live0.us.archive.org from 2012-11-22T04:31:46 UTC to 2012-11-22T19:27:11 UTC.
Topic: crawldata
NLS_2012
web
eye 259,302
favorite 0
comment 0
Internet Archive crawldata uploaded by selenium-101.us.archive.org:NLS-CRAWL-004 from Wed Oct 31 04:28:48 PDT 2012 to Wed Dec 12 01:48:33 PST 2012.
Topic: crawldata
Live Web Proxy Crawls
web
eye 255,109
favorite 0
comment 0
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-live1.us.archive.org from 2012-12-14T17:34:12 UTC to 2012-12-15T10:27:19 UTC.
Topic: crawldata
The Archive Team Just In Time Grabs
web
eye 247,542
favorite 1
comment 0
Live Web Proxy Crawls
web
eye 245,620
favorite 0
comment 0
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-live1.us.archive.org from 2012-12-13T10:53:21 UTC to 2012-12-13T16:53:32 UTC.
Topic: crawldata
Archive Team
web
eye 244,865
favorite 0
comment 0
This item contains regular captures of the homepages of Dutch news websites, in screenshot and WARC format. Websites: nos.nl, teletekst.nos.nl, rtlnieuws.nl, nu.nl, telegraaf.nl, metronieuws.nl, spitsnieuws.nl, volkskrant.nl, nrc.nl, trouw.nl, parool.nl, fd.nl, refdag.nl
Live Web Proxy Crawls
web
eye 244,446
favorite 0
comment 0
Internet Archive Liveweb Capture from WayBackMachine, captured by wwwb-gen1.us.archive.org:wbm from Sun Jan 1 09:53:25 PST 2012 to Sun Jan 1 03:42:17 PST 2012.
Topic: crawldata
recurrence=NONE, maxDuration=86400, maxDocumentCount=null, isTestCrawl=false, seedCount=2185, patchForQaJobId=1036, accountId=413, organizationName="Michigan State University", collectionId=2356, collectionName="Colleges, Schools, Research Centers & Institutes Collection"
Live Web Proxy Crawls
web
eye 243,577
favorite 0
comment 0
Internet Archive Liveweb Capture from WayBackMachine, captured by wwwb-gen1.us.archive.org:wbm from Tue Feb 28 04:16:59 PST 2012 to Tue Feb 28 03:44:00 PST 2012.
Topic: crawldata
reddit.com
web
eye 240,027
favorite 0
comment 0
Source: reddit.com
Live Web Proxy Crawls
web
eye 239,884
favorite 0
comment 0
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-live0.us.archive.org from 2012-10-15T10:26:27 UTC to 2012-10-15T18:41:53 UTC.
Topic: crawldata
Live Web Proxy Crawls
web
eye 237,343
favorite 0
comment 0
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-live1.us.archive.org from 2012-12-20T18:25:28 UTC to 2012-12-21T16:41:39 UTC.
Topic: crawldata
Live Web Proxy Crawls
web
eye 235,053
favorite 0
comment 0
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-live1.us.archive.org from 2012-12-27T00:05:42 UTC to 2012-12-27T16:35:17 UTC.
Topic: crawldata
Live Web Proxy Crawls
web
eye 234,839
favorite 0
comment 0
Internet Archive Liveweb Capture from WayBackMachine, captured by wwwb-gen1.us.archive.org:wbm from Mon Mar 5 12:41:27 PST 2012 to Mon Mar 5 09:03:22 PST 2012.
Topic: crawldata
NLS_2012
web
eye 234,770
favorite 0
comment 0
Internet Archive crawldata uploaded by selenium-101.us.archive.org:NLS-CRAWL-004 from Sun Jul 8 09:00:15 PDT 2012 to Wed Dec 12 01:48:28 PST 2012.
Topic: crawldata
NLS_2012
web
eye 232,989
favorite 0
comment 0
Internet Archive crawldata uploaded by selenium-101.us.archive.org:NLS-CRAWL-004 from Thu Jun 14 21:41:14 PDT 2012 to Fri Jan 11 03:09:56 PST 2013.
Topic: crawldata
Live Web Proxy Crawls
web
eye 232,119
favorite 0
comment 0
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-live0.us.archive.org from 2012-12-14T22:44:21 UTC to 2012-12-15T05:16:24 UTC.
Topic: crawldata
ameblo.jp
web
eye 230,305
favorite 0
comment 0
Source: ameblo.jp
reddit.com
web
eye 228,420
favorite 0
comment 0
Source: reddit.com
Wide Crawl started April 2012
web
eye 228,266
favorite 0
comment 0
Internet Archive crawldata from Webwide Crawl, captured by crawl426.us.archive.org:wide from Sun May 13 07:09:39 PDT 2012 to Sun May 13 01:49:20 PDT 2012.
Topic: crawldata
Archive Team
web
eye 227,679
favorite 0
comment 0
This item contains regular captures of the homepages of Dutch news websites, in screenshot and WARC format. Websites: nos.nl, teletekst.nos.nl, rtlnieuws.nl, nu.nl, telegraaf.nl, metronieuws.nl, spitsnieuws.nl, volkskrant.nl, nrc.nl, trouw.nl, parool.nl, fd.nl, refdag.nl
Live Web Proxy Crawls
web
eye 227,034
favorite 0
comment 0
Internet Archive Liveweb Capture from WayBackMachine, captured by wwwb-gen1.us.archive.org:wbm from Tue Apr 24 15:08:33 PDT 2012 to Tue Apr 24 11:01:55 PDT 2012.
Topic: crawldata
Live Web Proxy Crawls
web
eye 226,767
favorite 0
comment 0
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-live1.us.archive.org from 2012-12-30T19:10:53 UTC to 2012-12-31T10:26:34 UTC.
Topic: crawldata
Live Web Proxy Crawls
web
eye 225,842
favorite 0
comment 0
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-live1.us.archive.org from 2012-10-30T19:58:33 UTC to 2012-10-31T18:47:34 UTC.
Topic: crawldata
Live Web Proxy Crawls
web
eye 224,538
favorite 0
comment 0
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-live1.us.archive.org from 2012-12-27T07:49:06 UTC to 2012-12-28T07:29:11 UTC.
Topic: crawldata
Live Web Proxy Crawls
web
eye 223,221
favorite 0
comment 0
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-gen1.us.archive.org from 2012-07-15T10:45:03 UTC to 2012-07-15T18:48:12 UTC.
Topic: crawldata
Live Web Proxy Crawls
web
eye 222,670
favorite 0
comment 0
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-live1.us.archive.org from 2012-12-28T19:46:00 UTC to 2012-12-29T15:05:14 UTC.
Topic: crawldata
Wide Crawl started April 2012
web
eye 222,027
favorite 0
comment 0
Internet Archive crawldata from Webwide Crawl, captured by crawl413.us.archive.org:wide from Thu May 3 12:36:06 PDT 2012 to Thu May 3 07:30:25 PDT 2012.
Topic: crawldata
Live Web Proxy Crawls
web
eye 221,674
favorite 0
comment 0
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-live1.us.archive.org from 2012-12-18T15:31:35 UTC to 2012-12-19T04:58:30 UTC.
Topic: crawldata
Wide Crawl started April 2012
web
eye 219,097
favorite 0
comment 0
Internet Archive crawldata from Webwide Crawl, captured by crawl427.us.archive.org:wide from Fri May 25 03:27:07 PDT 2012 to Thu May 24 22:54:59 PDT 2012.
Topic: crawldata
Live Web Proxy Crawls
web
eye 218,352
favorite 0
comment 0
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-gen1.us.archive.org from 2012-07-27T10:56:04 UTC to 2012-07-27T17:15:08 UTC.
Topic: crawldata
Live Web Proxy Crawls
web
eye 218,164
favorite 0
comment 0
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-live1.us.archive.org from 2012-10-13T05:37:57 UTC to 2012-10-13T20:05:35 UTC.
Topic: crawldata
ask.com
web
eye 216,831
favorite 0
comment 0
Source: ask.com
Live Web Proxy Crawls
web
eye 215,749
favorite 0
comment 0
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-live1.us.archive.org from 2012-12-07T06:06:22 UTC to 2012-12-10T21:54:23 UTC.
Topic: crawldata